GPUML: Graphical processors for speeding up kernel machines
Authors
Abstract
Algorithms based on kernel methods play a central role in statistical machine learning. At their core are a number of linear algebra operations on matrices of kernel functions which take as arguments the training and testing data. These range from the simple matrix-vector product to more complex matrix decompositions, and to iterative formulations of these. The algorithms often scale quadratically or cubically in both memory and operational complexity, so as data sizes increase, kernel methods scale poorly. We use parallelized approaches on a multi-core graphics processor (GPU) to partially address this lack of scalability. GPUs are used to scale three different classes of problems: the simple kernel matrix-vector product, iterative solution of linear systems involving kernel functions, and QR and Cholesky decomposition of kernel matrices. Applications of these accelerated approaches to scaling several kernel-based learning methods are shown, and in each case substantial speedups are obtained. The core software is released as an open-source package, GPUML.
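The first of the three operations the abstract names, the kernel matrix-vector product, can be illustrated with a minimal NumPy sketch (a CPU illustration of the computation GPUML accelerates, not the package's own code; the Gaussian kernel and all names here are illustrative):

```python
import numpy as np

def kernel_matrix_vector(X, Y, q, h):
    """Compute f_j = sum_i q_i * exp(-||x_i - y_j||^2 / h^2),
    i.e. a Gaussian kernel matrix applied to a weight vector q."""
    # Squared pairwise distances via ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2
    d2 = (np.sum(X**2, axis=1)[:, None]
          - 2.0 * X @ Y.T
          + np.sum(Y**2, axis=1)[None, :])
    K = np.exp(-d2 / h**2)   # N x M Gaussian kernel matrix
    return K.T @ q           # length-M result: one sum per evaluation point

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))   # N source points in 3-D
Y = rng.standard_normal((200, 3))   # M evaluation points
q = rng.standard_normal(500)        # per-source weights
f = kernel_matrix_vector(X, Y, q, h=1.0)
print(f.shape)                      # (200,)
```

Forming K explicitly costs O(NM) memory and time, which is exactly the quadratic scaling the abstract refers to; on a GPU the independent entries of K map naturally onto parallel threads.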
Similar resources
Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method
In this article, a parallel computer program is implemented, based on the Finite Element Method, to speed up the analysis of hollow circular cylinders made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials whose composition varies gradually over the volume. In parallel processing, an algorithm is first divided into independent tasks, which may use individual or shared da...
Kronecker Factorization for Speeding up Kernel Machines
In kernel machines, such as kernel principal component analysis (KPCA), Gaussian processes (GPs), and support vector machines (SVMs), the computational complexity of finding a solution is O(n³), where n is the number of training instances. To reduce this expensive computational cost, we propose using Kronecker factorization, which approximates a positive definite kernel matrix by the Krone...
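The payoff of a Kronecker-factored kernel matrix is that matrix-vector products never need the full matrix, via the identity (K₁ ⊗ K₂)vec(V) = vec(K₂ V K₁ᵀ). A small NumPy sketch (illustrative factor sizes, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 8, 6                      # full matrix is (n1*n2) x (n1*n2)

# Small symmetric positive definite factors standing in for kernel blocks
K1 = rng.standard_normal((n1, n1)); K1 = K1 @ K1.T
K2 = rng.standard_normal((n2, n2)); K2 = K2 @ K2.T

v = rng.standard_normal(n1 * n2)

# Naive: materialize the Kronecker product -- O((n1*n2)^2) memory
y_naive = np.kron(K1, K2) @ v

# Fast: (K1 (x) K2) vec(V) = vec(K2 V K1^T); never form the big matrix
V = v.reshape(n2, n1, order="F")   # column-stacking inverse of vec
y_fast = (K2 @ V @ K1.T).flatten(order="F")

assert np.allclose(y_naive, y_fast)
```

The fast path costs two small matrix products instead of one (n1·n2)² product, which is the source of the speedup such factorizations aim for.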
Cascaded Execution: Speeding Up Unparallelized Execution on Shared-Memory Multiprocessors
Both inherently sequential code and limitations of analysis techniques prevent full parallelization of many applications by parallelizing compilers. Amdahl’s Law tells us that as parallelization becomes increasingly effective, any unparallelized loop becomes an increasingly dominant performance bottleneck. We present a technique for speeding up the execution of unparallelized loops by cascading...
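The Amdahl's Law bound this abstract invokes is easy to make concrete (a generic illustration, not code from the paper):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Overall speedup when a fraction p of the work runs in parallel
    on n processors and the remaining (1 - p) stays serial."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_processors)

# Even with 95% of the work parallelized, the speedup saturates
# at 1 / (1 - 0.95) = 20x no matter how many processors are added.
for n in (4, 16, 64, 1_000_000):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

This is why the remaining unparallelized loops become the dominant bottleneck as parallelization improves, which is the problem cascaded execution targets.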
Communication-Efficient Parallel Block Minimization for Kernel Machines
Kernel machines often yield superior predictive performance on various tasks; however, they suffer from severe computational challenges. In this paper, we show how to overcome the important challenge of speeding up kernel machines. In particular, we develop a parallel block minimization framework for solving kernel machines, including kernel SVM and kernel logistic regression. Our framework pro...
Reconfigurable Architecture Exploration for Speeding Up Execution of Code Generated from High-Level Specifications
Software generated from finite state machines targeted for standard embedded systems processors generally displays poor execution speeds. This paper evaluates an architecture that couples a standard processor with a reconfigurable unit in order to improve the execution speed of the generated code. A method for automatically partitioning the code is presented along with results obtained from sim...
Publication date: 2010